Selected project: Red Wine Quality


1. Initializing the data


##        X          fixed.acidity   volatile.acidity  citric.acid   
##  Min.   :   1.0   Min.   : 4.60   Min.   :0.1200   Min.   :0.000  
##  1st Qu.: 400.5   1st Qu.: 7.10   1st Qu.:0.3900   1st Qu.:0.090  
##  Median : 800.0   Median : 7.90   Median :0.5200   Median :0.260  
##  Mean   : 800.0   Mean   : 8.32   Mean   :0.5278   Mean   :0.271  
##  3rd Qu.:1199.5   3rd Qu.: 9.20   3rd Qu.:0.6400   3rd Qu.:0.420  
##  Max.   :1599.0   Max.   :15.90   Max.   :1.5800   Max.   :1.000  
##  residual.sugar     chlorides       free.sulfur.dioxide
##  Min.   : 0.900   Min.   :0.01200   Min.   : 1.00      
##  1st Qu.: 1.900   1st Qu.:0.07000   1st Qu.: 7.00      
##  Median : 2.200   Median :0.07900   Median :14.00      
##  Mean   : 2.539   Mean   :0.08747   Mean   :15.87      
##  3rd Qu.: 2.600   3rd Qu.:0.09000   3rd Qu.:21.00      
##  Max.   :15.500   Max.   :0.61100   Max.   :72.00      
##  total.sulfur.dioxide    density             pH          sulphates     
##  Min.   :  6.00       Min.   :0.9901   Min.   :2.740   Min.   :0.3300  
##  1st Qu.: 22.00       1st Qu.:0.9956   1st Qu.:3.210   1st Qu.:0.5500  
##  Median : 38.00       Median :0.9968   Median :3.310   Median :0.6200  
##  Mean   : 46.47       Mean   :0.9967   Mean   :3.311   Mean   :0.6581  
##  3rd Qu.: 62.00       3rd Qu.:0.9978   3rd Qu.:3.400   3rd Qu.:0.7300  
##  Max.   :289.00       Max.   :1.0037   Max.   :4.010   Max.   :2.0000  
##     alcohol         quality     
##  Min.   : 8.40   Min.   :3.000  
##  1st Qu.: 9.50   1st Qu.:5.000  
##  Median :10.20   Median :6.000  
##  Mean   :10.42   Mean   :5.636  
##  3rd Qu.:11.10   3rd Qu.:6.000  
##  Max.   :14.90   Max.   :8.000
## 
## 0.99007  0.9902 0.99064  0.9908 0.99084  0.9912  0.9915 0.99154 0.99157 
##       2       1       2       1       1       1       1       1       1 
##  0.9916 0.99162  0.9917 0.99182 0.99191  0.9921  0.9922 0.99235 0.99236 
##       2       1       1       2       1       1       2       1       1 
##  0.9924 0.99242 0.99252 0.99256 0.99258 0.99264  0.9927  0.9928 0.99286 
##       3       2       1       1       3       1       1       2       1 
##  0.9929 0.99292 0.99294 0.99306 0.99314 0.99316 0.99318  0.9932 0.99322 
##       1       1       2       1       1       2       1       1       1 
## 0.99323 0.99328  0.9933 0.99331 0.99332 0.99334 0.99336  0.9934 0.99341 
##       1       1       1       2       1       1       1       4       1 
## 0.99344 0.99346 0.99348  0.9935 0.99352 0.99354 0.99356 0.99357 0.99358 
##       1       3       1       1       2       2       4       1       3 
##  0.9936 0.99362 0.99364  0.9937 0.99371 0.99374 0.99376 0.99378 0.99379 
##       2       2       1       2       2       2       3       3       1 
##  0.9938 0.99384 0.99385 0.99386 0.99387 0.99388 0.99392 0.99394 0.99395 
##       1       1       1       1       1       2       2       1       1 
## 0.99396 0.99397   0.994 0.99402 0.99408  0.9941 0.99414 0.99416 0.99417 
##       3       1       2       4       3       1       2       1       1 
## 0.99418 0.99419  0.9942 0.99425 0.99426 0.99428  0.9943 0.99434 0.99437 
##       2       2       3       1       1       1       2       1       1 
## 0.99438 0.99439  0.9944 0.99444 0.99448 0.99451 0.99454 0.99456 0.99458 
##       5       1       3       4       4       1       1       1       4 
## 0.99459  0.9946 0.99462 0.99464 0.99467 0.99468  0.9947 0.99471 0.99472 
##       1       5       2       2       2       1       6       3       3 
## 0.99473 0.99474 0.99476 0.99478 0.99479  0.9948 0.99483 0.99484 0.99486 
##       1       1       3       2       1       9       1       3       1 
## 0.99488 0.99489  0.9949 0.99491 0.99492 0.99494 0.99495 0.99496 0.99498 
##       4       3       4       1       2       4       2       1       5 
## 0.99499   0.995 0.99501 0.99502 0.99504 0.99506 0.99508 0.99509  0.9951 
##       1      10       1       2       2       1       3       1       4 
## 0.99512 0.99514 0.99516 0.99517 0.99518 0.99519  0.9952 0.99521 0.99522 
##       2       5       6       1       3       1       9       1       4 
## 0.99523 0.99524 0.99525 0.99526 0.99528 0.99529  0.9953 0.99531 0.99532 
##       1       4       2       2       3       1       4       2       1 
## 0.99533 0.99534 0.99536 0.99538  0.9954 0.99541 0.99542 0.99543 0.99544 
##       1       6       2      11       4       1       1       2       1 
## 0.99545 0.99546 0.99547 0.99549  0.9955 0.99551 0.99552 0.99553 0.99554 
##       3       7       2       2      14       3       5       1       3 
## 0.99555 0.99556 0.99557 0.99558  0.9956 0.99562 0.99564 0.99565 0.99566 
##       1       2       3       3      14       4       2       3       4 
## 0.99568 0.99569  0.9957 0.99572 0.99573 0.99574 0.99575 0.99576 0.99577 
##       4       1       6       9       1       2       2       5       3 
## 0.99578  0.9958 0.99581 0.99582 0.99584 0.99585 0.99586 0.99587 0.99588 
##       3      14       1       1       2       3       6       2       4 
## 0.99589  0.9959 0.99592 0.99593 0.99594 0.99596 0.99598 0.99599   0.996 
##       1      13       4       2       1       2       2       2      13 
## 0.99603 0.99604 0.99605 0.99606 0.99608 0.99609  0.9961 0.99612 0.99613 
##       2       3       3       2       2       1      10       6       4 
## 0.99614 0.99615 0.99616 0.99617 0.99619  0.9962 0.99621 0.99622 0.99623 
##       2       5       7       1       1      28       1       5       2 
## 0.99624 0.99625 0.99627 0.99628 0.99629  0.9963 0.99631 0.99632 0.99633 
##       3       3       3       3       2      15       1       4       4 
## 0.99634 0.99635 0.99636 0.99638 0.99639  0.9964 0.99641 0.99642 0.99643 
##       3       1       5       5       2      25       1       3       1 
## 0.99645 0.99646 0.99647 0.99648 0.99649  0.9965 0.99651 0.99652 0.99654 
##       1       1       2       3       1      11       1       6       2 
## 0.99655 0.99656 0.99658 0.99659  0.9966 0.99661 0.99664 0.99665 0.99666 
##       6       5       1       2      23       1       3       1       3 
## 0.99667 0.99668 0.99669  0.9967 0.99672 0.99674 0.99675 0.99676 0.99677 
##       1       4       2      13       5       2       5       3       2 
## 0.99678  0.9968 0.99682 0.99683 0.99684 0.99685 0.99686 0.99688 0.99689 
##       1      35       2       2       1       8       3       2       4 
##  0.9969 0.99692 0.99693 0.99694 0.99695 0.99697 0.99698 0.99699   0.997 
##      18       4       2       3       1       1       1       1      24 
## 0.99701 0.99702 0.99704 0.99705 0.99706 0.99708 0.99709  0.9971 0.99712 
##       2       4       3       1       2       4       1      13       4 
## 0.99713 0.99714 0.99716 0.99717 0.99718 0.99719  0.9972 0.99721 0.99722 
##       2       2       2       1       3       1      36       1       1 
## 0.99724 0.99725 0.99726 0.99727 0.99728 0.99729  0.9973 0.99732 0.99733 
##       4       1       1       1       3       1      18       3       1 
## 0.99734 0.99735 0.99736 0.99738 0.99739  0.9974 0.99743 0.99744 0.99745 
##       4       6       5       4       1      22       2       2       9 
## 0.99746 0.99747 0.99748  0.9975 0.99752 0.99754 0.99756 0.99758  0.9976 
##       7       2       3       7       1       1       1       1      35 
## 0.99761 0.99764 0.99765 0.99768 0.99769  0.9977 0.99772 0.99774 0.99779 
##       1       1       1       3       2       4       1       5       1 
##  0.9978 0.99782 0.99783 0.99784 0.99785 0.99786 0.99787 0.99788  0.9979 
##      26       2       2       1       1       4       3       2      14 
## 0.99791 0.99796 0.99798   0.998 0.99801 0.99803 0.99808  0.9981 0.99814 
##       1       1       2      29       2       3       1      10       2 
## 0.99815 0.99817 0.99818  0.9982 0.99822 0.99823 0.99824 0.99828  0.9983 
##       2       2       3      23       1       1       3       2       9 
## 0.99832 0.99834 0.99836  0.9984 0.99842 0.99845  0.9985 0.99852 0.99854 
##       1       1       2      20       2       1       3       1       1 
## 0.99855 0.99859  0.9986 0.99864 0.99865  0.9987 0.99878  0.9988 0.99888 
##       2       1      19       1       2      12       1      20       2 
##  0.9989 0.99892   0.999 0.99901  0.9991 0.99914 0.99915 0.99918  0.9992 
##       2       3       8       1      10       3       1       1       7 
## 0.99922 0.99925  0.9993 0.99935 0.99938 0.99939  0.9994  0.9995  0.9996 
##       1       1       4       1       1       1      24       1      12 
## 0.99965  0.9997 0.99974 0.99975 0.99976  0.9998  0.9999       1 1.00005 
##       1       8       1       1       1      10       1      10       2 
##  1.0001 1.00012 1.00015  1.0002 1.00024 1.00025  1.0003  1.0004  1.0006 
##       4       1       2      10       1       1       2       9       6 
##  1.0008   1.001  1.0014  1.0015  1.0018  1.0021  1.0022 1.00242  1.0026 
##       3       6       6       2       1       2       2       2       2 
## 1.00289 1.00315  1.0032 1.00369 
##       1       3       1       2
## 
##   3   4   5   6   7   8 
##  10  53 681 638 199  18
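Density takes many distinct values spread across a narrow range (0.9901 to 1.0037), and most wines have a quality of 5 or 6. For fixed.acidity, volatile.acidity, citric.acid, residual.sugar, chlorides, free.sulfur.dioxide, and total.sulfur.dioxide, the maximum lies far above the third quartile, while the values between the minimum and the third quartile are spread fairly evenly. It is not yet clear how these large values affect the final quality of the wine.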
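
The gap between the maximum and the third quartile can be quantified directly from the summary output above. The sketch below is only an illustration: it re-enters the printed quartiles, maxima, and quality counts by hand (the names `stats`, `max_over_q3`, and `share_5_6` are made up here) rather than reading the original data file.

```r
# Third quartile and maximum, copied from the summary() output above.
stats <- data.frame(
  variable = c("fixed.acidity", "volatile.acidity", "citric.acid",
               "residual.sugar", "chlorides", "free.sulfur.dioxide",
               "total.sulfur.dioxide"),
  q3  = c(9.20, 0.64, 0.42, 2.60, 0.09, 21, 62),
  max = c(15.90, 1.58, 1.00, 15.50, 0.611, 72, 289)
)

# How many times larger the maximum is than the third quartile.
stats$max_over_q3 <- round(stats$max / stats$q3, 2)
print(stats[order(-stats$max_over_q3), ], row.names = FALSE)

# Share of wines rated 5 or 6, from the quality counts above.
quality_counts <- c("3" = 10, "4" = 53, "5" = 681, "6" = 638, "7" = 199, "8" = 18)
share_5_6 <- sum(quality_counts[c("5", "6")]) / sum(quality_counts)
print(round(share_5_6, 3))
```

With these numbers, chlorides and residual.sugar stand out, with maxima roughly six to seven times their third quartile, and about 82% of the wines are rated 5 or 6.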

quality

fixed.acidity

Bivariate analysis

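Correlation coefficients from the scatterplot matrix: the variables most strongly correlated with quality are volatile.acidity (r = -0.391), alcohol (r = 0.476), and sulphates (r = 0.25). The bivariate analysis focuses on the relationships among these variables.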
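
A ranking like the one above can also be computed without reading values off the scatterplot matrix. The following is a minimal sketch, not the code used for this report: `cor_with_target` is a hypothetical helper, and `toy` is simulated data standing in for the wine data frame.

```r
# Rank predictor columns by the absolute value of their Pearson
# correlation with a target column (hypothetical helper).
cor_with_target <- function(df, target = "quality") {
  predictors <- setdiff(names(df), target)
  r <- sapply(predictors, function(v) cor(df[[v]], df[[target]]))
  r[order(-abs(r))]
}

# Simulated data for illustration only; these are NOT the wine measurements.
set.seed(1)
n <- 200
toy <- data.frame(
  alcohol          = rnorm(n),
  volatile.acidity = rnorm(n),
  pH               = rnorm(n)
)
toy$quality <- 0.5 * toy$alcohol - 0.4 * toy$volatile.acidity + rnorm(n)

r <- cor_with_target(toy)
print(round(r, 3))
```

Applied to the real data frame, the same call would list every numeric column, so weaker correlations are not overlooked.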

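The plot above shows that the wines are concentrated at quality 5 and 6 (the first and third quartiles of quality are 5 and 6, so at least half of the wines fall there). The wine whose fixed.acidity is close to the maximum of about 16 has quality 5, while wines of quality 7 or 8 usually do not have very high fixed.acidity, mostly below 13.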

Wines with quality 5 to 7 are tightly clustered. Among wines with quality below 4, some have relatively high volatile.acidity, whereas wines with quality above 7 generally have lower volatile.acidity.

The overall trend in the plot above is that quality decreases as volatile.acidity increases.

Within each quality level, citric.acid is distributed fairly evenly. The line through the per-level medians shows a mildly increasing trend.
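
The median line mentioned above can be checked numerically by computing one median per quality level. This is a sketch on a small hand-made data frame (`toy` is invented for illustration and does not contain the wine measurements).

```r
# Toy data for illustration only; these are NOT the wine measurements.
toy <- data.frame(
  quality     = c(3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 8, 8),
  citric.acid = c(0.02, 0.20, 0.05, 0.25, 0.10, 0.24, 0.40,
                  0.15, 0.26, 0.45, 0.30, 0.42, 0.35, 0.50)
)

# One median of citric.acid per quality level.
medians <- tapply(toy$citric.acid, toy$quality, median)
print(medians)

# TRUE when every median is larger than the one for the level below it.
increasing <- all(diff(as.numeric(medians)) > 0)
print(increasing)
```

The same `tapply` call on the real data frame would give the values that the median line in the plot connects.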

## `geom_smooth()` using method = 'gam'

From the plot above, fixed.acidity and density are approximately linearly related: density increases as fixed.acidity increases.

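From the plot above, fixed.acidity and pH are approximately linearly related: the higher the fixed.acidity, the lower the pH. (This is expected, since pH is the measure of acidity.)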
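
The direction of this relationship follows from the definition of pH as the negative base-10 logarithm of the hydrogen ion concentration. The short sketch below uses made-up concentrations purely to show the direction; it does not model the weak acids in wine, so the relationship in the real data is only approximate.

```r
# pH = -log10([H+]), with [H+] in mol/L.
# Illustrative concentrations only, not measured from any wine.
h_concentration <- c(1e-4, 1e-3, 1e-2)
ph <- -log10(h_concentration)
print(ph)  # each tenfold increase in [H+] lowers pH by 1
```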

## `geom_smooth()` using method = 'gam'

density and alcohol show a roughly exponentially decreasing relationship; their correlation coefficient is -0.496.

## `geom_smooth()` using method = 'gam'

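total.sulfur.dioxide increases with free.sulfur.dioxide, which is expected: free sulfur dioxide is part of the total, so the two are strongly correlated and, other things being equal, more free sulfur dioxide means more total sulfur dioxide.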

  • Sulfur dioxide is present in wine in gaseous form, and its concentration affects the wine's flavor.
## Warning: Removed 15 rows containing non-finite values (stat_summary).

## `geom_smooth()` using method = 'gam'

## 
##  4.6  4.7  4.9    5  5.1  5.2  5.3  5.4  5.5  5.6  5.7  5.8  5.9    6  6.1 
##    1    1    1    6    4    6    4    5    1   14    2    4    9   13   16 
##  6.2  6.3  6.4  6.5  6.6  6.7  6.8  6.9    7  7.1  7.2  7.3  7.4  7.5  7.6 
##   20   14   25   17   37   28   46   38   50   57   67   44   44   52   46 
##  7.7  7.8  7.9    8  8.1  8.2  8.3  8.4  8.5  8.6  8.7  8.8  8.9    9  9.1 
##   49   53   42   42   26   45   40   26   19   27   24   34   33   26   29 
##  9.2  9.3  9.4  9.5  9.6  9.7  9.8  9.9   10 10.1 10.2 10.3 10.4 10.5 10.6 
##   16   22   17   14   17    9   15   26   23   10   19   11   21   12   14 
## 10.7 10.8 10.9   11 11.1 11.2 11.3 11.4 11.5 11.6 11.7 11.8 11.9   12 12.1 
##   10   10    8    3    9    5    7    5   13   12    3    3   12    7    1 
## 12.2 12.3 12.4 12.5 12.6 12.7 12.8 12.9   13 13.2 13.3 13.4 13.5 13.7 13.8 
##    4    5    4    7    4    4    5    2    3    3    3    1    1    2    1 
##   14 14.3   15 15.5 15.6 15.9 
##    1    1    2    2    2    1
## 
##    0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09  0.1 0.11 0.12 0.13 0.14 
##  132   33   50   30   29   20   24   22   33   30   35   15   27   18   21 
## 0.15 0.16 0.17 0.18 0.19  0.2 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 
##   19    9   16   22   21   25   33   27   25   51   27   38   20   19   21 
##  0.3 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39  0.4 0.41 0.42 0.43 0.44 
##   30   30   32   25   24   13   20   19   14   28   29   16   29   15   23 
## 0.45 0.46 0.47 0.48 0.49  0.5 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59 
##   22   19   18   23   68   20   13   17   14   13   12    8    9    9    8 
##  0.6 0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69  0.7 0.71 0.72 0.73 0.74 
##    9    2    1   10    9    7   14    2   11    4    2    1    1    3    4 
## 0.75 0.76 0.78 0.79    1 
##    1    3    1    1    1

Multivariate analysis

## Warning: Removed 29 rows containing non-finite values (stat_summary).

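Total acid content is strongly related to pH, but it shows no direct effect on wine quality. Across the quality levels, total acid content mostly lies between 5 and 15, and pH lies almost entirely between 3 and 4.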

Linear prediction model

##  [1] "X"                    "fixed.acidity"        "volatile.acidity"    
##  [4] "citric.acid"          "residual.sugar"       "chlorides"           
##  [7] "free.sulfur.dioxide"  "total.sulfur.dioxide" "density"             
## [10] "pH"                   "sulphates"            "alcohol"             
## [13] "quality"              "acid"
## Loading required package: lattice
## Loading required package: MASS
## 
## Attaching package: 'memisc'
## The following objects are masked from 'package:stats':
## 
##     contr.sum, contr.treatment, contrasts
## The following object is masked from 'package:base':
## 
##     as.array
## 
## Calls:
## model: lm(formula = I(acid ~ quality), data = wine)
## model2: lm(formula = acid ~ quality + fixed.acidity, data = wine)
## model3: lm(formula = acid ~ quality + fixed.acidity + volatile.acidity, 
##     data = wine)
## model4: lm(formula = acid ~ quality + fixed.acidity + volatile.acidity + 
##     citric.acid, data = wine)
## model5: lm(formula = acid ~ quality + fixed.acidity + volatile.acidity + 
##     citric.acid + alcohol, data = wine)
## model6: lm(formula = acid ~ quality + fixed.acidity + volatile.acidity + 
##     citric.acid + alcohol + sulphates, data = wine)
## 
## ========================================================================================================================================================================================
##                        model         model2         model3                      model4                                  model5                                  model6                  
## ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
##   (Intercept)           7.791***       0.629***      -0.018                                   0.000***                                0.000**                                 0.000**   
##                        (0.322)        (0.031)        (0.033)                                 (0.000)                                 (0.000)                                 (0.000)    
##   quality               0.235***      -0.046***      -0.001                                   0.000*                                  0.000                                   0.000     
##                        (0.056)        (0.005)        (0.004)                                 (0.000)                                 (0.000)                                 (0.000)    
##   fixed.acidity                        1.051***       1.063***                                1.000***                                1.000***                                1.000***  
##                                       (0.002)        (0.002)                                 (0.000)                                 (0.000)                                 (0.000)    
##   volatile.acidity                                    0.556***                                1.000***                                1.000***                                1.000***  
##                                                      (0.019)                                 (0.000)                                 (0.000)                                 (0.000)    
##   citric.acid                                                                                 1.000***                                1.000***                                1.000***  
##                                                                                              (0.000)                                 (0.000)                                 (0.000)    
##   alcohol                                                                                                                             0.000                                   0.000     
##                                                                                                                                      (0.000)                                 (0.000)    
##   sulphates                                                                                                                                                                   0.000     
##                                                                                                                                                                              (0.000)    
## ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
##   R-squared             0.011          0.993          0.996                                   1.000                                   1.000                                   1.000     
##   adj. R-squared        0.010          0.993          0.996                                   1.000                                   1.000                                   1.000     
##   sigma                 1.823          0.151          0.122                                   0.000                                   0.000                                   0.000     
##   F                    17.379     116247.523     118938.541     2674897743917158941414433751040.000     2138865031038253163413986344960.000     1781348751094982461809856872448.000     
##   p                     0.000          0.000          0.000                                   0.000                                   0.000                                   0.000     
##   Log-likelihood    -3228.395        751.025       1092.088                               47989.920                               47990.028                               47990.064     
##   Deviance           5309.616         36.594         23.886                                   0.000                                   0.000                                   0.000     
##   AIC                6462.791      -1494.050      -2174.177                              -95967.839                              -95966.056                              -95964.128     
##   BIC                6478.922      -1472.541      -2147.291                              -95935.577                              -95928.416                              -95921.111     
##   N                  1599           1599           1599                                    1599                                    1599                                    1599         
## ========================================================================================================================================================================================
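
The coefficients of exactly 1.000 and the R-squared of 1.000 in model4 to model6 are what one would see if acid were defined as the sum of fixed.acidity, volatile.acidity, and citric.acid. That definition is an assumption inferred from the table, not something stated in this report. The sketch below reproduces the pattern on simulated data (`toy` is not the wine data).

```r
# Simulated data for illustration only; these are NOT the wine measurements.
set.seed(42)
n <- 100
toy <- data.frame(
  fixed.acidity    = runif(n, 4.6, 15.9),
  volatile.acidity = runif(n, 0.12, 1.58),
  citric.acid      = runif(n, 0, 1),
  quality          = sample(3:8, n, replace = TRUE)
)

# Assumption: acid is the plain sum of the three acid columns.
toy$acid <- toy$fixed.acidity + toy$volatile.acidity + toy$citric.acid

# With only some components as predictors, the fit is good but not exact.
partial <- lm(acid ~ quality + fixed.acidity, data = toy)
print(summary(partial)$r.squared)

# With all three components, the response is reproduced exactly:
# the component coefficients are 1, quality is 0, residuals vanish.
full <- lm(acid ~ quality + fixed.acidity + volatile.acidity + citric.acid,
           data = toy)
print(round(coef(full), 3))
print(max(abs(residuals(full))))
```

If that assumption holds, the R-squared of 1.000 in model4 to model6 reflects the definition of acid rather than predictive power.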

Final plots

1.

2.

3.

Reflection: